k-fold cross validation
Detecting Statistically Significant Fairness Violations in Recidivism Forecasting Algorithms
Machine learning algorithms are increasingly deployed in critical domains such as finance, healthcare, and criminal justice [1]. The increasing popularity of algorithmic decision-making has stimulated interest in algorithmic fairness within the academic community. Researchers have introduced various fairness definitions that quantify disparities between privileged and protected groups, use causal inference to determine the impact of race on model predictions, and that test calibration of probability predictions from the model. Existing literature does not provide a way in which to assess whether observed disparities between groups are statistically significant or merely due to chance. This paper introduces a rigorous framework for testing the statistical significance of fairness violations by leveraging k-fold cross-validation [2] to generate sampling distributions of fairness metrics. This paper introduces statistical tests that can be used to identify statistically significant violations of fairness metrics based on disparities between predicted and actual outcomes, model calibration, and causal inference techniques [1]. We demonstrate this approach by testing recidivism forecasting algorithms trained on data from the National Institute of Justice. Our findings reveal that machine learning algorithms used for recidivism forecasting exhibit statistically significant bias against Black individuals under several fairness definitions, while also exhibiting no bias or bias against White individuals under other definitions. The results from this paper underscore the importance of rigorous and robust statistical testing while evaluating algorithmic decision-making systems.
Evaluating K-Fold Cross Validation for Transformer Based Symbolic Regression Models
Kislay, Kaustubh, Singh, Shlok, Joshi, Soham, Dutta, Rohan, Flint, Jay Shim George, Zhu, Kevin
Symbolic Regression remains an NP-Hard problem, with extensive research focusing on AI models for this task. Transformer models have shown promise in Symbolic Regression, but performance suffers with smaller datasets. We propose applying k-fold cross-validation to a transformer-based symbolic regression model trained on a significantly reduced dataset (15,000 data points, down from 500,000). This technique partitions the training data into multiple subsets (folds), iteratively training on some while validating on others. Our aim is to provide an estimate of model generalization and mitigate overfitting issues associated with smaller datasets. Results show that this process improves the model's output consistency and generalization by a relative improvement in validation loss of 53.31%. Potentially enabling more efficient and accessible symbolic regression in resource-constrained environments.
Is K-fold cross validation the best model selection method for Machine Learning?
Gorriz, Juan M, Segovia, F, Ramirez, J, Ortiz, A, Suckling, J.
As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance and frequently outperforms conventional hypothesis testing. This improvement uses measures directly obtained from machine learning classifications, such as accuracy, that do not have a parametric description. To approach a frequentist analysis within machine learning pipelines, a permutation test or simple statistics from data partitions (i.e. folds) can be added to estimate confidence intervals. Unfortunately, neither parametric nor non-parametric tests solve the inherent problems around partitioning small sample-size datasets and learning from heterogeneous data sources. The fact that machine learning strongly depends on the learning parameters and the distribution of data across folds recapitulates familiar difficulties around excess false positives and replication. The origins of this problem are demonstrated by simulating common experimental circumstances, including small sample sizes, low numbers of predictors, and heterogeneous data sources. A novel statistical test based on K-fold CV and the Upper Bound of the actual error (K-fold CUBV) is composed, where uncertain predictions of machine learning with CV are bounded by the \emph{worst case} through the evaluation of concentration inequalities. Probably Approximately Correct-Bayesian upper bounds for linear classifiers in combination with K-fold CV is used to estimate the empirical error. The performance with neuroimaging datasets suggests this is a robust criterion for detecting effects, validating accuracy values obtained from machine learning whilst avoiding excess false positives.
Implementing Custom GridSearchCV and RandomSearchCV without scikit-learn
Scikit-Learn offers two vehicles for optimizing hyperparameter tuning: GridSearchCV and RandomizedSearchCV. GridSearchCV performs an exhaustive search over specified parameter values for an estimator (or machine learning algorithm) and returns the best performing hyperparametric combination. So, all we need to do is specify the hyperparameters with which we want to experiment and their range of values, and GridSearchCV performs all possible combinations of hyperparameter values using cross-validation. As such, we naturally limit our choice of hyperparameters and their range of values. Theoretically, we can specify a set of parameter values for ALL hyperparameters of a model, but such a search consumes vast computer resources and time.
How to Evaluate the Performance of PyTorch Models - MachineLearningMastery.com How to Evaluate the Performance of PyTorch Models - MachineLearningMastery.com
Designing a deep learning model is sometimes an art. There are a lot of decision points and it is not easy to tell what is the best. One way to come up with a design is by trial and error and evaluating the result on real data. Therefore, it is important to have a scientific method to evaluate the performance of your neural network and deep learning models. In fact, it is also the same method to compare any kind of machine learning models on a particular usage.
Cross Validation. Cross-validation is a technique forโฆ
Cross-validation is a technique for evaluating a machine learning model and testing its performance. Cross-validation is a technique used to evaluate the performance of a machine learning model by training it on different subsets of the data and testing it on the remaining subset. Cross-validation is also known as rotation estimation or out-of-sample testing. Rotation estimation refers to the process of rotating, or splitting, the data into different subsets. Simply put, in the process of cross-validation, the original data sample is randomly divided into several subsets.
Self-Supervised Mental Disorder Classifiers via Time Reversal
Iqbal, Zafar, Mahmood, Usman, Fu, Zening, Plis, Sergey
Data scarcity is a notable problem, especially in the medical domain, due to patient data laws. Therefore, efficient Pre-Training techniques could help in combating this problem. In this paper, we demonstrate that a model trained on the time direction of functional neuro-imaging data could help in any downstream task, for example, classifying diseases from healthy controls in fMRI data. We train a Deep Neural Network on Independent components derived from fMRI data using the Independent component analysis (ICA) technique. It learns time direction in the ICA-based data. This pre-trained model is further trained to classify brain disorders in different datasets. Through various experiments, we have shown that learning time direction helps a model learn some causal relation in fMRI data that helps in faster convergence, and consequently, the model generalizes well in downstream classification tasks even with fewer data records.
Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation
Lawrence, Nick, Shen, Mingren, Yin, Ruiqi, Feng, Cloris, Morgan, Dane
The use of accurate scanning transmission electron microscopy (STEM) image simulation methods require large computation times that can make their use infeasible for the simulation of many images. Other simulation methods based on linear imaging models, such as the convolution method, are much faster but are too inaccurate to be used in application. In this paper, we explore deep learning models that attempt to translate a STEM image produced by the convolution method to a prediction of the high accuracy multislice image. We then compare our results to those of regression methods. We find that using the deep learning model Generative Adversarial Network (GAN) provides us with the best results and performs at a similar accuracy level to previous regression models on the same dataset.
Evaluate the Performance Of Deep Learning Models in Keras
Keras is an easy to use and powerful Python library for deep learning. There are a lot of decisions to make when designing and configuring your deep learning models. Most of these decisions must be resolved empirically through trial and error and evaluating them on real data. As such, it is critically important to have a robust way to evaluate the performance of your neural networks and deep learning models. In this post you will discover a few ways that you can use to evaluate model performance using Keras.
Methodology to Create Analysis-Naive Holdout Records as well as Train and Test Records for Machine Learning Analyses in Healthcare
Bennett, Michele, Nekouei, Mehdi, Mehta, Armand Prieditis Rajesh, Kleczyk, Ewa, Hayes, Karin
It is common for researchers to holdout data from a study pool to be used for external validation as well as for future research, and the same desire is true to those using machine learning modeling research. For this discussion, the purpose of the holdout sample it is preserve data for research studies that will be analysis-naive and randomly selected from the full dataset. Analysis-naive are records that are not used for testing or training machine learning (ML) models and records that do not participate in any aspect of the current machine learning study. The methodology suggested for creating holdouts is a modification of k-fold cross validation, which takes into account randomization and efficiently allows a three-way split (holdout, test and training) as part of the method without forcing. The paper also provides a working example using set of automated functions in Python and some scenarios for applicability in healthcare.